In this exercise we use the Boston data from the MASS library. This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. The data include 14 variables and 506 rows.
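The dimensions and structure listed below can be reproduced with a few lines of R. This is a minimal sketch, assuming the MASS package is installed; the original code chunk is not shown in the report, so the exact calls are an assumption.

```r
# Load the MASS package, which ships the Boston housing data
library(MASS)
data("Boston")

# Dimensions: number of rows and columns
dim(Boston)

# Structure: variable names, types, and the first few values
str(Boston)
```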
## [1] 506 14
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
| variable | description |
|---|---|
| crim | per capita crime rate by town |
| zn | proportion of residential land zoned for lots over 25,000 sq.ft. |
| indus | proportion of non-retail business acres per town |
| chas | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) |
| nox | nitrogen oxides concentration (parts per 10 million) |
| rm | average number of rooms per dwelling |
| age | proportion of owner-occupied units built prior to 1940 |
| dis | weighted mean of distances to five Boston employment centres |
| rad | index of accessibility to radial highways |
| tax | full-value property-tax rate per $10,000 |
| ptratio | pupil-teacher ratio by town |
| black | 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town |
| lstat | lower status of the population (percent) |
| medv | median value of owner-occupied homes in $1000 |
There are some very interesting distributions of variables in the plot matrix. The variable rad takes mostly either high or low values, so its values are concentrated on either side of the plot.
The plotted correlation matrix shows some high correlations between variables:
The correlation between industrial areas (indus) and nitrogen oxides (nox) is quite clear: industry adds pollution to the area. Industry also seems to correlate with variables like age, dis, rad and tax.
The correlations of nitrogen oxides (nox) are very similar to those of industry (indus).
Crime rate (crim) seems to correlate with good accessibility to radial highways (rad) and with the property-tax rate (tax).
The proportion of old houses (age) is also related to the distance to employment centres (dis).
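The pairwise relationships described above can also be checked numerically. The sketch below uses base R's `cor()`; the plotting package behind the original figures is not stated in the report, so no plotting call is assumed here, and the name `cor_matrix` is only illustrative.

```r
library(MASS)
data("Boston")

# Pearson correlation matrix of all 14 variables, rounded for readability
cor_matrix <- round(cor(Boston), 2)

# Correlations of indus with the variables mentioned in the text
cor_matrix["indus", c("nox", "age", "dis", "rad", "tax")]

# Crime rate against highway accessibility and property tax
cor_matrix["crim", c("rad", "tax")]

# Old houses against distance to employment centres
cor_matrix["age", "dis"]
```

The sign of each coefficient shows the direction of the relationship, which the plot alone can make hard to read.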
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
All the variables are numerical, so we can use the scale() function to scale the whole data set.
## crim zn indus
## Min. :-0.419367 Min. :-0.48724 Min. :-1.5563
## 1st Qu.:-0.410563 1st Qu.:-0.48724 1st Qu.:-0.8668
## Median :-0.390280 Median :-0.48724 Median :-0.2109
## Mean : 0.000000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.007389 3rd Qu.: 0.04872 3rd Qu.: 1.0150
## Max. : 9.924110 Max. : 3.80047 Max. : 2.4202
## chas nox rm age
## Min. :-0.2723 Min. :-1.4644 Min. :-3.8764 Min. :-2.3331
## 1st Qu.:-0.2723 1st Qu.:-0.9121 1st Qu.:-0.5681 1st Qu.:-0.8366
## Median :-0.2723 Median :-0.1441 Median :-0.1084 Median : 0.3171
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.:-0.2723 3rd Qu.: 0.5981 3rd Qu.: 0.4823 3rd Qu.: 0.9059
## Max. : 3.6648 Max. : 2.7296 Max. : 3.5515 Max. : 1.1164
## dis rad tax ptratio
## Min. :-1.2658 Min. :-0.9819 Min. :-1.3127 Min. :-2.7047
## 1st Qu.:-0.8049 1st Qu.:-0.6373 1st Qu.:-0.7668 1st Qu.:-0.4876
## Median :-0.2790 Median :-0.5225 Median :-0.4642 Median : 0.2746
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.6617 3rd Qu.: 1.6596 3rd Qu.: 1.5294 3rd Qu.: 0.8058
## Max. : 3.9566 Max. : 1.6596 Max. : 1.7964 Max. : 1.6372
## black lstat medv
## Min. :-3.9033 Min. :-1.5296 Min. :-1.9063
## 1st Qu.: 0.2049 1st Qu.:-0.7986 1st Qu.:-0.5989
## Median : 0.3808 Median :-0.1811 Median :-0.1449
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.4332 3rd Qu.: 0.6024 3rd Qu.: 0.2683
## Max. : 0.4406 Max. : 3.5453 Max. : 2.9865
## [1] "matrix"
Scaling puts the variables on a comparable range: each one is centred to mean 0, as the summary above shows. Before scaling, variables like black and tax took values roughly a hundredfold larger than some other variables.
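As a sketch of what the scaling does, the block below standardises the data with `scale()` and repeats the calculation by hand for one column. The names `boston_scaled` and `crim_manual` are assumptions made for illustration; the report does not show the original code.

```r
library(MASS)
data("Boston")

# scale() subtracts each column's mean and divides by its standard deviation
boston_scaled <- scale(Boston)

# The same standardisation written out by hand for one column
crim_manual <- (Boston$crim - mean(Boston$crim)) / sd(Boston$crim)
all.equal(as.numeric(boston_scaled[, "crim"]), crim_manual)

# scale() returns a matrix; convert it to a data frame for later steps
class(boston_scaled)
boston_scaled <- as.data.frame(boston_scaled)
summary(boston_scaled)
```

Converting back to a data frame keeps column access with `$` available in the later steps.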
The variable crim is the basis of the new categorical variable crime.
| categories | quantile points |
|---|---|
| low | 0%-25% |
| med_low | 25%-50% |
| med_high | 50%-75% |
| high | 75%-100% |
Quantile points of the variable crim
## 0% 25% 50% 75% 100%
## -0.419366929 -0.410563278 -0.390280295 0.007389247 9.924109610
## crime
## low med_low med_high high
## 127 126 126 127
## zn indus chas nox
## Min. :-0.48724 Min. :-1.5563 Min. :-0.2723 Min. :-1.4644
## 1st Qu.:-0.48724 1st Qu.:-0.8668 1st Qu.:-0.2723 1st Qu.:-0.9121
## Median :-0.48724 Median :-0.2109 Median :-0.2723 Median :-0.1441
## Mean : 0.00000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.04872 3rd Qu.: 1.0150 3rd Qu.:-0.2723 3rd Qu.: 0.5981
## Max. : 3.80047 Max. : 2.4202 Max. : 3.6648 Max. : 2.7296
## rm age dis rad
## Min. :-3.8764 Min. :-2.3331 Min. :-1.2658 Min. :-0.9819
## 1st Qu.:-0.5681 1st Qu.:-0.8366 1st Qu.:-0.8049 1st Qu.:-0.6373
## Median :-0.1084 Median : 0.3171 Median :-0.2790 Median :-0.5225
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.4823 3rd Qu.: 0.9059 3rd Qu.: 0.6617 3rd Qu.: 1.6596
## Max. : 3.5515 Max. : 1.1164 Max. : 3.9566 Max. : 1.6596
## tax ptratio black lstat
## Min. :-1.3127 Min. :-2.7047 Min. :-3.9033 Min. :-1.5296
## 1st Qu.:-0.7668 1st Qu.:-0.4876 1st Qu.: 0.2049 1st Qu.:-0.7986
## Median :-0.4642 Median : 0.2746 Median : 0.3808 Median :-0.1811
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 1.5294 3rd Qu.: 0.8058 3rd Qu.: 0.4332 3rd Qu.: 0.6024
## Max. : 1.7964 Max. : 1.6372 Max. : 0.4406 Max. : 3.5453
## medv crime
## Min. :-1.9063 low :127
## 1st Qu.:-0.5989 med_low :126
## Median :-0.1449 med_high:126
## Mean : 0.0000 high :127
## 3rd Qu.: 0.2683
## Max. : 2.9865
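The categorical variable can be built by cutting the scaled crim at its quantile points. The following is a minimal sketch of that idea, not necessarily the code used for the report; the names `bins` and `boston_scaled` are assumptions.

```r
library(MASS)
data("Boston")
boston_scaled <- as.data.frame(scale(Boston))

# Quantiles of the scaled crime rate serve as the break points
bins <- quantile(boston_scaled$crim)
bins

# Cut crim into four categories; include.lowest keeps the minimum value
crime <- cut(boston_scaled$crim, breaks = bins, include.lowest = TRUE,
             labels = c("low", "med_low", "med_high", "high"))
table(crime)

# Replace the numeric crim column with the new factor
boston_scaled$crim <- NULL
boston_scaled$crime <- crime
summary(boston_scaled)
```

Using the quartiles as break points is what makes the four classes nearly equal in size.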
The training set contains 80% of the data; the remaining 20% forms the test set. The randomly sampled row numbers of the training set are:
## [1] 203 454 483 224 422 488 405 415 272 232 125 159 356 120 429 70 90
## [18] 457 277 424 57 248 312 481 498 320 455 298 102 52 88 77 31 278
## [35] 284 132 292 418 438 191 451 139 386 61 305 448 219 476 473 231 400
## [52] 247 1 15 370 453 252 357 140 301 391 273 122 146 109 244 101 56
## [69] 426 68 490 254 393 367 472 462 243 368 327 447 192 82 442 332 469
## [86] 342 353 30 313 328 309 299 485 484 169 148 311 323 4 156 374 187
## [103] 8 343 268 84 420 5 395 325 470 465 174 168 149 69 456 220 337
## [120] 121 413 266 48 3 246 468 458 234 329 387 230 364 213 256 355 46
## [137] 276 65 445 105 211 251 396 160 359 270 280 466 171 435 151 18 67
## [154] 408 199 491 14 188 37 427 66 184 315 20 74 72 103 28 218 348
## [171] 503 414 36 228 100 324 341 242 384 233 106 340 296 108 41 399 204
## [188] 158 25 162 432 75 182 322 200 223 236 354 124 249 98 63 303 215
## [205] 19 440 142 76 346 2 423 336 245 390 29 496 181 58 258 206 119
## [222] 86 330 26 96 289 180 372 163 385 389 153 195 12 388 439 170 51
## [239] 39 495 373 401 428 380 326 23 339 317 111 7 185 409 290 177 394
## [256] 378 437 260 471 350 126 80 333 492 314 371 352 486 197 81 44 331
## [273] 9 172 13 93 302 361 239 287 307 434 128 144 238 62 198 241 407
## [290] 285 397 79 467 433 304 392 318 477 382 253 358 150 449 479 176 250
## [307] 235 344 55 497 499 216 482 24 217 275 417 201 294 97 99 186 446
## [324] 210 406 22 494 504 282 240 281 179 274 141 147 129 381 114 152 50
## [341] 59 334 116 441 288 135 205 493 295 178 107 34 316 376 112 319 54
## [358] 209 403 404 286 196 42 460 506 193 474 202 255 345 35 45 60 53
## [375] 154 104 118 487 138 489 166 269 338 377 115 369 500 237 21 95 17
## [392] 6 450 183 78 267 85 505 89 87 425 464 27 10
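A random 80/20 split like the one above can be sketched as follows. The seed is an assumption added only to make the split repeatable, so the sampled row numbers will differ from those printed in the report.

```r
library(MASS)
data("Boston")
boston_scaled <- as.data.frame(scale(Boston))

n <- nrow(boston_scaled)

# Assumption: a fixed seed, only so that the split is repeatable
set.seed(123)

# Randomly pick 80% of the row indices for training
ind <- sample(n, size = floor(0.8 * n))

# Rows in ind form the training set, all other rows the test set
train <- boston_scaled[ind, ]
test <- boston_scaled[-ind, ]

c(train = nrow(train), test = nrow(test))
```

With 506 rows this gives 404 training rows and 102 test rows.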
First, linear discriminant analysis (LDA) is fitted to the training set. The new categorical variable crime is the target variable and all the other variables of the dataset are predictors.
After fitting, we draw the LDA biplot with arrows.
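The fit and the biplot can be sketched as below. The call `lda(crime ~ ., data = train)` matches the output that follows; everything else is an assumption for illustration, including the seed, the rebuilt data, and the hypothetical helper `lda_arrows()`, which is not part of MASS.

```r
library(MASS)
data("Boston")

# Rebuild the scaled data with the categorical crime variable
boston_scaled <- as.data.frame(scale(Boston))
boston_scaled$crime <- cut(boston_scaled$crim,
                           breaks = quantile(boston_scaled$crim),
                           include.lowest = TRUE,
                           labels = c("low", "med_low", "med_high", "high"))
boston_scaled$crim <- NULL

set.seed(123)  # assumption: seed chosen only for repeatability
ind <- sample(nrow(boston_scaled), size = floor(0.8 * nrow(boston_scaled)))
train <- boston_scaled[ind, ]

# Fit LDA with crime as the target and all other variables as predictors
lda_fit <- lda(crime ~ ., data = train)
lda_fit

# Hypothetical helper: draw the predictors' LD1/LD2 coefficients as arrows
lda_arrows <- function(fit, scale_by = 2, color = "red") {
  heads <- scale_by * fit$scaling[, 1:2]
  arrows(0, 0, heads[, 1], heads[, 2], col = color, length = 0.1)
  text(heads, labels = rownames(heads), col = color, pos = 3)
}

# Plot the training observations in discriminant space, coloured by class
scores <- predict(lda_fit, newdata = train)$x
plot(scores[, "LD1"], scores[, "LD2"], col = as.numeric(train$crime),
     xlab = "LD1", ylab = "LD2")
lda_arrows(lda_fit)
```

Longer arrows mark predictors with larger coefficients on the first two discriminants.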
## Call:
## lda(crime ~ ., data = train)
##
## Prior probabilities of groups:
## low med_low med_high high
## 0.2623762 0.2648515 0.2252475 0.2475248
##
## Group means:
## zn indus chas nox rm
## low 0.94651463 -0.9183456 -0.123759247 -0.8754984 0.44269466
## med_low -0.07850487 -0.2790180 0.022033567 -0.5628294 -0.12746575
## med_high -0.40996711 0.1550927 0.030524797 0.3389089 -0.06787385
## high -0.48724019 1.0171519 0.003267949 1.0392419 -0.41965157
## age dis rad tax ptratio
## low -0.8680064 0.9107230 -0.6936710 -0.7205813 -0.48494210
## med_low -0.3188371 0.3661931 -0.5439511 -0.4562280 -0.02651464
## med_high 0.3174192 -0.2891248 -0.3962792 -0.2926659 -0.18960230
## high 0.7791114 -0.8477139 1.6377820 1.5138081 0.78037363
## black lstat medv
## low 0.37963993 -0.75582641 0.527568166
## med_low 0.31219254 -0.14691625 -0.008139779
## med_high 0.06148452 0.01642143 0.058803298
## high -0.75752488 0.89880710 -0.676822429
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3
## zn 0.112563363 0.82215648 -0.73652558
## indus -0.040362373 -0.34248126 0.38224810
## chas 0.007063488 -0.01113207 0.26493627
## nox 0.446663016 -0.68030453 -1.61410171
## rm 0.051985099 0.07864996 -0.05682784
## age 0.244295125 -0.26014105 -0.07571549
## dis -0.094134085 -0.41383725 0.10338826
## rad 3.172360636 0.89170457 -0.02517781
## tax 0.001634071 0.02359935 0.53513182
## ptratio 0.124685795 0.01143191 -0.15060181
## black -0.106181682 0.02030843 0.10161962
## lstat 0.198462827 -0.17032902 0.41705915
## medv 0.061429329 -0.36218939 -0.26170105
##
## Proportion of trace:
## LD1 LD2 LD3
## 0.9563 0.0317 0.0120
## [1] 1 4 4 3 4 4 4 4 2 3 2 3 2 2 4 2 1 4 2 4 1 2 3 4 3 3 4 2 2 1 1 2 3 1 1
## [36] 3 1 4 4 2 4 2 4 2 1 4 2 4 3 3 4 3 1 3 4 4 2 4 3 1 4 2 1 3 2 2 2 1 4 1
## [71] 2 3 4 4 4 4 2 4 3 4 1 1 4 1 4 1 1 3 3 2 3 1 3 3 3 3 3 3 1 3 4 1 2 1 3
## [106] 1 4 1 4 3 4 4 2 3 3 2 4 2 1 1 4 3 2 1 2 4 4 3 1 4 3 4 2 1 1 2 2 1 4 2
## [141] 2 2 4 3 4 2 2 3 3 4 3 3 1 4 1 2 3 1 2 4 1 2 3 3 2 2 2 3 1 1 1 4 1 3 1
## [176] 3 1 2 4 3 2 1 2 2 1 4 1 3 3 3 4 1 1 2 1 3 3 1 2 2 2 2 2 3 3 4 3 2 1 1
## [211] 4 1 2 4 3 2 1 1 3 2 2 1 1 3 2 1 1 4 3 4 4 3 1 2 4 4 3 2 2 3 4 4 4 4 2
## [246] 3 1 3 2 2 2 4 1 1 4 4 4 3 4 1 2 2 1 2 3 4 1 3 1 1 2 1 2 3 2 1 1 4 2 1
## [281] 1 4 3 4 3 2 1 2 4 1 4 1 4 4 2 4 2 4 4 2 4 3 4 4 1 2 3 1 1 3 2 2 4 3 1
## [316] 1 4 1 2 2 1 1 4 3 4 3 2 1 1 2 1 1 2 3 3 3 4 2 3 2 2 1 2 4 1 3 1 2 1 1
## [351] 2 3 2 4 2 3 1 2 4 4 1 1 2 4 1 2 4 1 1 1 3 2 2 1 3 2 2 4 3 2 3 3 1 4 2
## [386] 4 2 3 3 1 3 1 4 2 2 3 1 2 1 1 4 4 3 2
## predicted
## correct low med_low med_high high
## low 13 8 0 0
## med_low 2 15 2 0
## med_high 1 9 24 1
## high 0 0 0 27
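The predictions were fairly good: 79 of the 102 test observations (about 77%) were classified correctly. The high class was predicted best: all 27 of its observations were classified correctly, and only one med_high observation was wrongly predicted as high. Most of the errors fell in the lower and middle classes: 8 of the 21 low observations and 9 of the 35 med_high observations were predicted as med_low.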
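A cross-tabulation of this kind can be produced as sketched below. The seed and object names are assumptions, so the counts will not match the table above exactly.

```r
library(MASS)
data("Boston")

# Rebuild the scaled data with the categorical crime variable
boston_scaled <- as.data.frame(scale(Boston))
boston_scaled$crime <- cut(boston_scaled$crim,
                           breaks = quantile(boston_scaled$crim),
                           include.lowest = TRUE,
                           labels = c("low", "med_low", "med_high", "high"))
boston_scaled$crim <- NULL

set.seed(123)  # assumption: seed chosen only for repeatability
ind <- sample(nrow(boston_scaled), size = floor(0.8 * nrow(boston_scaled)))
train <- boston_scaled[ind, ]
test <- boston_scaled[-ind, ]

lda_fit <- lda(crime ~ ., data = train)

# Keep the true classes aside and remove them from the test predictors
correct_classes <- test$crime
test$crime <- NULL

# Predict classes for the test set and cross-tabulate against the truth
lda_pred <- predict(lda_fit, newdata = test)
confusion <- table(correct = correct_classes, predicted = lda_pred$class)
confusion

# Overall accuracy: share of observations on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```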
Next I determine the optimal number of clusters for the Boston data. First I reload and scale the data. The variables need to be scaled to get comparable distances between observations.
## crim zn indus
## Min. :-0.419367 Min. :-0.48724 Min. :-1.5563
## 1st Qu.:-0.410563 1st Qu.:-0.48724 1st Qu.:-0.8668
## Median :-0.390280 Median :-0.48724 Median :-0.2109
## Mean : 0.000000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.007389 3rd Qu.: 0.04872 3rd Qu.: 1.0150
## Max. : 9.924110 Max. : 3.80047 Max. : 2.4202
## chas nox rm age
## Min. :-0.2723 Min. :-1.4644 Min. :-3.8764 Min. :-2.3331
## 1st Qu.:-0.2723 1st Qu.:-0.9121 1st Qu.:-0.5681 1st Qu.:-0.8366
## Median :-0.2723 Median :-0.1441 Median :-0.1084 Median : 0.3171
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.:-0.2723 3rd Qu.: 0.5981 3rd Qu.: 0.4823 3rd Qu.: 0.9059
## Max. : 3.6648 Max. : 2.7296 Max. : 3.5515 Max. : 1.1164
## dis rad tax ptratio
## Min. :-1.2658 Min. :-0.9819 Min. :-1.3127 Min. :-2.7047
## 1st Qu.:-0.8049 1st Qu.:-0.6373 1st Qu.:-0.7668 1st Qu.:-0.4876
## Median :-0.2790 Median :-0.5225 Median :-0.4642 Median : 0.2746
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.6617 3rd Qu.: 1.6596 3rd Qu.: 1.5294 3rd Qu.: 0.8058
## Max. : 3.9566 Max. : 1.6596 Max. : 1.7964 Max. : 1.6372
## black lstat medv
## Min. :-3.9033 Min. :-1.5296 Min. :-1.9063
## 1st Qu.: 0.2049 1st Qu.:-0.7986 1st Qu.:-0.5989
## Median : 0.3808 Median :-0.1811 Median :-0.1449
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.4332 3rd Qu.: 0.6024 3rd Qu.: 0.2683
## Max. : 0.4406 Max. : 3.5453 Max. : 2.9865
Next I calculate the distances between observations and determine the number of clusters.
One way to determine the number of clusters is to look at how the total within-cluster sum of squares (WCSS) behaves as the number of clusters changes. The total WCSS was calculated for 1 to 15 clusters. A good choice is the number of clusters at which the total WCSS drops sharply. In this case that number appears to be two. However, since the aim here is to learn, I decided to analyse a model with four clusters.
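The distance calculation and the WCSS curve can be sketched as follows. The seed and the `nstart` and `iter.max` settings are assumptions chosen to make K-means more stable; the report does not state which settings were used.

```r
library(MASS)
data("Boston")
boston_scaled <- as.data.frame(scale(Boston))

# Euclidean distances between all pairs of observations
dist_eu <- dist(boston_scaled)
summary(dist_eu)

set.seed(123)  # assumption: K-means starts are random, so fix a seed
k_max <- 15

# Total within-cluster sum of squares for k = 1, ..., 15
twcss <- sapply(1:k_max, function(k) {
  kmeans(boston_scaled, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

# Look for the k at which the curve drops sharply and then flattens
plot(1:k_max, twcss, type = "b",
     xlab = "number of clusters", ylab = "total WCSS")
```

For k = 1 the total WCSS equals the total sum of squares of the scaled data, which gives the curve a fixed starting point.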
After determining the number of clusters I run the K-means algorithm again.
When the data are divided into four clusters, there are clear differences in the distributions of several variables. Crim, zn, indus and black are variables where one can distinguish clear patterns between clusters. For some variables (rad and tax), one or two clusters form a clear distribution, but the observations of the other two clusters are ambiguous and no clear pattern can be recognised.
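A four-cluster solution and a plot matrix coloured by cluster can be sketched like this. The seed, `nstart`, and the choice of plotted variables are assumptions for illustration, and cluster numbering can differ between runs.

```r
library(MASS)
data("Boston")
boston_scaled <- as.data.frame(scale(Boston))

set.seed(123)  # assumption: seed chosen only for repeatability

# K-means with four clusters; nstart tries several random starts
km <- kmeans(boston_scaled, centers = 4, nstart = 10)

# Number of observations in each cluster
table(km$cluster)

# Scatterplot matrix of the variables discussed, coloured by cluster
vars <- c("crim", "zn", "indus", "black", "rad", "tax")
pairs(boston_scaled[, vars], col = km$cluster)
```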
After loading the Boston dataset I scale it to get comparable distances.
## crim zn indus
## Min. :-0.419367 Min. :-0.48724 Min. :-1.5563
## 1st Qu.:-0.410563 1st Qu.:-0.48724 1st Qu.:-0.8668
## Median :-0.390280 Median :-0.48724 Median :-0.2109
## Mean : 0.000000 Mean : 0.00000 Mean : 0.0000
## 3rd Qu.: 0.007389 3rd Qu.: 0.04872 3rd Qu.: 1.0150
## Max. : 9.924110 Max. : 3.80047 Max. : 2.4202
## chas nox rm age
## Min. :-0.2723 Min. :-1.4644 Min. :-3.8764 Min. :-2.3331
## 1st Qu.:-0.2723 1st Qu.:-0.9121 1st Qu.:-0.5681 1st Qu.:-0.8366
## Median :-0.2723 Median :-0.1441 Median :-0.1084 Median : 0.3171
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.:-0.2723 3rd Qu.: 0.5981 3rd Qu.: 0.4823 3rd Qu.: 0.9059
## Max. : 3.6648 Max. : 2.7296 Max. : 3.5515 Max. : 1.1164
## dis rad tax ptratio
## Min. :-1.2658 Min. :-0.9819 Min. :-1.3127 Min. :-2.7047
## 1st Qu.:-0.8049 1st Qu.:-0.6373 1st Qu.:-0.7668 1st Qu.:-0.4876
## Median :-0.2790 Median :-0.5225 Median :-0.4642 Median : 0.2746
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.: 0.6617 3rd Qu.: 1.6596 3rd Qu.: 1.5294 3rd Qu.: 0.8058
## Max. : 3.9566 Max. : 1.6596 Max. : 1.7964 Max. : 1.6372
## black lstat medv clust
## Min. :-3.9033 Min. :-1.5296 Min. :-1.9063 Min. :1.000
## 1st Qu.: 0.2049 1st Qu.:-0.7986 1st Qu.:-0.5989 1st Qu.:2.000
## Median : 0.3808 Median :-0.1811 Median :-0.1449 Median :3.000
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean :2.674
## 3rd Qu.: 0.4332 3rd Qu.: 0.6024 3rd Qu.: 0.2683 3rd Qu.:3.000
## Max. : 0.4406 Max. : 3.5453 Max. : 2.9865 Max. :4.000
The original Boston dataset is now scaled, and the result of the K-means clustering is saved in the variable clust.
Next, the LDA is performed and the biplot with arrows is created.
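This step can be sketched as below, with the cluster labels as the LDA target. The name `scaled_Boston` comes from the call in the output that follows; the seed, `nstart`, and the check for predictors that are constant within every cluster are assumptions added here. The check is a safeguard because `lda()` stops with an error on such predictors.

```r
library(MASS)
data("Boston")
scaled_Boston <- as.data.frame(scale(Boston))

set.seed(123)  # assumption: seed chosen only for repeatability
km <- kmeans(scaled_Boston, centers = 4, nstart = 10)

# Safeguard: find predictors that do not vary within any cluster
group_centered <- sapply(scaled_Boston, function(v) v - ave(v, km$cluster))
keep <- apply(group_centered, 2, sd) > 1e-4

# Keep the usable predictors and add the cluster labels as the target
lda_data <- scaled_Boston[, keep, drop = FALSE]
lda_data$clust <- km$cluster

lda_clust <- lda(clust ~ ., data = lda_data)
lda_clust

# Observations in discriminant space, coloured by cluster
scores <- predict(lda_clust, newdata = lda_data)$x
plot(scores[, 1], scores[, 2], col = lda_data$clust,
     xlab = "LD1", ylab = "LD2")

# Arrows for the predictors' LD1/LD2 coefficients (scaled by 3 for visibility)
heads <- 3 * lda_clust$scaling[, 1:2]
arrows(0, 0, heads[, 1], heads[, 2], col = "red", length = 0.1)
text(heads, labels = rownames(heads), col = "red", pos = 3)
```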
## Call:
## lda(clust ~ ., data = scaled_Boston)
##
## Prior probabilities of groups:
## 1 2 3 4
## 0.2114625 0.1304348 0.4308300 0.2272727
##
## Group means:
## crim zn indus chas nox rm
## 1 -0.3912182 1.2671159 -0.8754697 0.5739635 -0.7359091 0.9938426
## 2 1.4330759 -0.4872402 1.0689719 0.4435073 1.3439101 -0.7461469
## 3 -0.3894453 -0.2173896 -0.5212959 -0.2723291 -0.5203495 -0.1157814
## 4 0.2797949 -0.4872402 1.1892663 -0.2723291 0.8998296 -0.2770011
## age dis rad tax ptratio black
## 1 -0.6949417 0.7751031 -0.5965444 -0.6369476 -0.96586616 0.34190729
## 2 0.8575386 -0.9620552 1.2941816 1.2970210 0.42015742 -1.65562038
## 3 -0.3256000 0.3182404 -0.5741127 -0.6240070 0.02986213 0.34248644
## 4 0.7716696 -0.7723199 0.9006160 1.0311612 0.60093343 -0.01717546
## lstat medv
## 1 -0.8200275 1.11919598
## 2 1.1930953 -0.81904111
## 3 -0.2813666 -0.01314324
## 4 0.6116223 -0.54636549
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3
## crim 0.18113078 -0.5012256 -0.60535205
## zn 0.43297497 -1.0486194 0.67406151
## indus 1.37753200 0.3016928 1.07034034
## chas -0.04307937 -0.7598229 -0.22448239
## nox 1.04674638 -0.3861005 -0.33268952
## rm -0.14912869 -0.1510367 0.67942589
## age -0.09897424 0.0523110 0.26285587
## dis 0.13139210 -0.1593367 -0.03487882
## rad 0.65824136 0.5189795 0.48145070
## tax 0.28903561 -0.5773959 0.10350513
## ptratio 0.22236843 0.1668597 -0.09181715
## black -0.42730704 0.5843973 0.89869354
## lstat 0.24320629 -0.6197780 -0.01119242
## medv 0.21961575 -0.9485829 -0.17065360
##
## Proportion of trace:
## LD1 LD2 LD3
## 0.7596 0.1768 0.0636
The biplot shows that the variables indus, zn and medv are the most influential separators of the clusters.
The colours in both plots are based on four classes. The K-means plot seems to separate the clusters more clearly than the plot based on the crime classification.